Abstract: The spectral efficiency achievable with joint 
processing of pilot and data symbol observations is com- 
pared with that achievable through the conventional (sepa- 
rate) approach of first estimating the channel on the basis of 
the pilot symbols alone, and subsequently detecting the data 
symbols. Studied on the basis of a mutual information lower 
bound, joint processing is found to provide a non-negligible 
advantage relative to separate processing, particularly for 
fast fading. It is shown that, regardless of the fading rate, 
only a very small number of pilot symbols (at most one per 
transmit antenna and per channel coherence interval) should 
be transmitted if joint processing is allowed. 

I. Introduction 

Pilot symbols (a.k.a. training or reference symbols) are 
an inherent part of virtually every wireless system. Mo- 
tivated by this prevalence, the spectral efficiency achiev- 
able when coherently detecting data with the assistance 
of pilots has been the object of much analysis (e.g., [1]- 
[5]). A large fraction of such work has focused on the 
spectral efficiency achievable with Gaussian inputs un- 
der the assumption that the fading channel is estimated 
on the basis of the pilot observations and then, using 
such an estimate as if it were the true channel, the data is 
detected. Although suboptimal, such separate processing 
reflects the operating conditions of existing systems. 

In this paper, we move beyond this approach and 
quantify the advantage of jointly processing pilot and 
data observations when Gaussian codebooks are uti- 
lized. Since the general mutual information expression 
is intractable, we rely on lower bounds to the achievable 
spectral efficiency. These bounds allow assessing the 
optimum number of pilot symbols under such joint pro- 
cessing, and also quantify the minimum improvement 
in spectral efficiency that joint processing brings about 
relative to separate processing. 

Although there has been prior work on receiver design 
for joint processing (e.g., [6]-[8]), to the best of our 
knowledge there is not yet a general understanding of 
the conditions (in terms of signal-to-noise ratio, fading 
rate, and antenna configurations) in which joint process- 
ing provides a substantial improvement. Given that joint 
processing is more complex than separate processing, 
such a quantification appears very useful. 



As a starting point, a simple block-fading ergodic channel model is considered. Section II restricts itself to scalar channels, from which many of the insights can already be derived. The generalization to MIMO (multiple-input multiple-output) follows in Section III.

II. SISO

A. Channel Model 

Let H represent a discrete-time scalar fading channel. 
Under block Rayleigh-fading, the channel is drawn from 
a zero-mean complex Gaussian distribution at the begin- 
ning of each block and it then remains constant for the T 
symbols composing the block, where T corresponds to 
the coherence time/bandwidth. This process is repeated 
for every block in an IID (independent identically distributed) fashion. A total of $\tau$ pilot symbols are inserted within each block, leaving $T-\tau$ symbols available for data.

During the transmission of pilot symbols,

$$\mathbf{y}_p = \sqrt{\mathsf{SNR}}\,H + \mathbf{n}_p \tag{1}$$

where the received signal, $\mathbf{y}_p$, and the noise, $\mathbf{n}_p$, are $\tau$-dimensional vectors. The entries of $\mathbf{n}_p$ are IID zero-mean unit-variance complex Gaussian. The channel satisfies $E[|H|^2] = 1$ and thus $\mathsf{SNR}$ indicates the average signal-to-noise ratio. During the transmission of data symbols,

$$\mathbf{y}_d = \sqrt{\mathsf{SNR}}\,H\,\mathbf{x} + \mathbf{n}_d \tag{2}$$

where $\mathbf{y}_d$, $\mathbf{n}_d$, and the transmitted data, $\mathbf{x}$, are all $(T-\tau)$-dimensional. The noise $\mathbf{n}_d$ is independent of $\mathbf{n}_p$ but it abides by the same distribution. As argued in the Introduction, the entries of $\mathbf{x}$ are IID zero-mean unit-variance complex Gaussian. Each transmitted codeword spans a large number of fading blocks, which endows ergodic quantities with operational meaning.

B. Perfect CSI 

If the receiver is provided with perfect CSI (channel-state information), Gaussian codebooks are capacity-achieving and the ergodic capacity, in bits/s/Hz, equals

$$C(\mathsf{SNR}) = E\left[\log_2\left(1 + \mathsf{SNR}\,|H|^2\right)\right] \tag{3}$$

$$= \log_2(e)\, e^{1/\mathsf{SNR}}\, E_1\!\left(\frac{1}{\mathsf{SNR}}\right) \tag{4}$$

where $E_k(\cdot)$ is the exponential integral of order $k$. For compactness, $C(\mathsf{SNR})$ is often abbreviated as $C$.
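The closed form (4) is easy to check numerically. The Python sketch below is illustrative and is not taken from the paper: `scaled_expint` is a hypothetical helper that evaluates $e^{x}E_n(x)$ with a power series for $x \le 1$ and a continued fraction (modified Lentz method) for $x > 1$, following the scheme of Numerical Recipes' `expint`, so that only the standard library is needed. The Monte Carlo routine averages (3) with $|H|^2 \sim \mathrm{Exp}(1)$, which is the unit-power Rayleigh assumption of Section II-A.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def scaled_expint(n: int, x: float) -> float:
    """Return exp(x) * E_n(x) for integer n >= 1 and x > 0."""
    if n < 1 or x <= 0.0:
        raise ValueError("requires n >= 1 and x > 0")
    nm1 = n - 1
    if x > 1.0:
        # Continued fraction (modified Lentz); the exp(-x) factor is omitted,
        # which is exactly the exp(x) scaling we want and avoids underflow.
        b, c = x + n, 1e300
        d = 1.0 / b
        h = d
        for i in range(1, 1000):
            a = -i * (nm1 + i)
            b += 2.0
            d = 1.0 / (a * d + b)
            c = b + a / c
            delta = c * d
            h *= delta
            if abs(delta - 1.0) < 1e-14:
                return h
        raise RuntimeError("continued fraction did not converge")
    # Power series for E_n(x), valid for 0 < x <= 1.
    ans = 1.0 / nm1 if nm1 else -math.log(x) - EULER_GAMMA
    fact = 1.0
    for i in range(1, 1000):
        fact *= -x / i
        if i != nm1:
            delta = -fact / (i - nm1)
        else:
            psi = -EULER_GAMMA + sum(1.0 / j for j in range(1, nm1 + 1))
            delta = fact * (-math.log(x) + psi)
        ans += delta
        if abs(delta) < abs(ans) * 1e-15:
            return math.exp(x) * ans
    raise RuntimeError("series did not converge")


def capacity_perfect_csi(snr: float) -> float:
    """Closed form (4) of the perfect-CSI capacity, in bits/s/Hz."""
    return math.log2(math.e) * scaled_expint(1, 1.0 / snr)


def capacity_monte_carlo(snr: float, trials: int = 200_000, seed: int = 1) -> float:
    """Sample mean of log2(1 + SNR |H|^2) in (3), with |H|^2 ~ Exp(1)."""
    rng = random.Random(seed)
    total = sum(math.log2(1.0 + snr * rng.expovariate(1.0)) for _ in range(trials))
    return total / trials
```

For instance, `capacity_perfect_csi(1.0)` evaluates to about 0.86 bits/s/Hz at 0 dB, below the 1 bit/s/Hz of an unfaded channel, as Jensen's inequality requires.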

C. Separated Processing of Pilots and Data 

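If the receiver uses the pilot observations, $\mathbf{y}_p$, to first produce an MMSE estimate of the channel, $\hat{H}$, and then performs nearest-neighbor decoding while treating $\hat{H}$ as if it were $H$, the maximum spectral efficiency is [5]

$$I_{\mathrm{s}} = \max_{\tau}\, \frac{T-\tau}{T}\, C(\mathsf{SNR}_{\mathrm{eff}}) \tag{5}$$

with

$$\mathsf{SNR}_{\mathrm{eff}} = \frac{\mathsf{SNR}\,(1-\mathrm{MMSE})}{1 + \mathsf{SNR}\cdot\mathrm{MMSE}} \tag{6}$$

and $\mathrm{MMSE} = E[|H - \hat{H}|^2] = 1/(1 + \mathsf{SNR}\,\tau)$. The maximization in (5) must be computed numerically as no closed form exists.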
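A minimal sketch of the numerical maximization in (5), assuming an exhaustive search over the integer pilot counts $\tau \in \{1, \ldots, T-1\}$. The function names are hypothetical, and the exponential-integral helper is included so the block runs on its own.

```python
import math

EULER_GAMMA = 0.5772156649015329


def scaled_expint(n: int, x: float) -> float:
    """Return exp(x) * E_n(x) for integer n >= 1 and x > 0 (series / continued fraction)."""
    if n < 1 or x <= 0.0:
        raise ValueError("requires n >= 1 and x > 0")
    nm1 = n - 1
    if x > 1.0:
        b, c = x + n, 1e300
        d = 1.0 / b
        h = d
        for i in range(1, 1000):
            a = -i * (nm1 + i)
            b += 2.0
            d = 1.0 / (a * d + b)
            c = b + a / c
            delta = c * d
            h *= delta
            if abs(delta - 1.0) < 1e-14:
                return h
        raise RuntimeError("continued fraction did not converge")
    ans = 1.0 / nm1 if nm1 else -math.log(x) - EULER_GAMMA
    fact = 1.0
    for i in range(1, 1000):
        fact *= -x / i
        if i != nm1:
            delta = -fact / (i - nm1)
        else:
            psi = -EULER_GAMMA + sum(1.0 / j for j in range(1, nm1 + 1))
            delta = fact * (-math.log(x) + psi)
        ans += delta
        if abs(delta) < abs(ans) * 1e-15:
            return math.exp(x) * ans
    raise RuntimeError("series did not converge")


def capacity_perfect_csi(snr: float) -> float:
    """Perfect-CSI capacity (4) in bits/s/Hz."""
    return math.log2(math.e) * scaled_expint(1, 1.0 / snr)


def separate_processing_rate(snr: float, T: int):
    """Return (I_s, maximizing tau) for (5)-(6), searching tau over 1, ..., T-1."""
    best_rate, best_tau = -math.inf, 0
    for tau in range(1, T):
        mmse = 1.0 / (1.0 + snr * tau)                    # MMSE of the pilot-based estimate
        snr_eff = snr * (1.0 - mmse) / (1.0 + snr * mmse)  # effective SNR, eq. (6)
        rate = (T - tau) / T * capacity_perfect_csi(snr_eff)
        if rate > best_rate:
            best_rate, best_tau = rate, tau
    return best_rate, best_tau
```

Calling `separate_processing_rate(10.0, 100)` returns the separate-processing rate at 10 dB together with the pilot count that attains it.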

D. Spectral Efficiency Lower Bounds for Joint Processing 

In the general case, the receiver decodes the data based upon $\mathbf{y}_p$ and $\mathbf{y}_d$ without any constraints on how these observations are used. The per-symbol mutual information $I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d)/T$ is the maximum achievable spectral efficiency and is achieved by a maximum-likelihood decoder based on the true channel description $p(\mathbf{y}_p,\mathbf{y}_d|\mathbf{x})$. Since the expression for this mutual information is intractable, we instead utilize the following lower bound.

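Theorem 1: The ergodic spectral efficiency in bits/s/Hz when $\tau$ pilot symbols and $(T-\tau)$ complex Gaussian data symbols are transmitted on every fading block and jointly processed at the receiver satisfies

$$\frac{1}{T}\, I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d) \ge I_{\mathrm{L}_1} \ge I_{\mathrm{L}_2} \tag{7}$$

where

$$I_{\mathrm{L}_1} = \frac{T-\tau}{T}\, C - \frac{\log_2(e)}{T} \sum_{k=1}^{T-\tau} e^{\tau + 1/\mathsf{SNR}}\, E_k\!\left(\tau + \frac{1}{\mathsf{SNR}}\right) \tag{8}$$

and

$$I_{\mathrm{L}_2} = \frac{T-\tau}{T}\, C - \frac{1}{T}\log_2\!\left(\frac{1+\mathsf{SNR}\,T}{1+\mathsf{SNR}\,\tau}\right). \tag{9}$$

Proof: See Appendix A.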

The bound $I_{\mathrm{L}_1}$ (or, more precisely, its MIMO form given in Section III) was first derived in [4]. However, it was not given as in (8) but rather left as an expectation over the distribution of $\mathbf{x}$. As shown in Appendix A, where we provide an alternative derivation, this expectation can be expressed in closed form using the results of [9].

When no pilots are transmitted ($\tau = 0$), $I_{\mathrm{L}_1}$ reduces to the bound given for data-only transmission in [10].
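The bounds (8) and (9) can be evaluated directly. In the sketch below (hypothetical function names; the exponential-integral helper is included so the block runs on its own), the argument $\tau + 1/\mathsf{SNR}$ of the exponential integrals is the reciprocal of $\mathsf{SNR}/(1+\mathsf{SNR}\,\tau)$, the SNR scaling that appears inside the expectation of Appendix A.

```python
import math

EULER_GAMMA = 0.5772156649015329


def scaled_expint(n: int, x: float) -> float:
    """Return exp(x) * E_n(x) for integer n >= 1 and x > 0 (series / continued fraction)."""
    if n < 1 or x <= 0.0:
        raise ValueError("requires n >= 1 and x > 0")
    nm1 = n - 1
    if x > 1.0:
        b, c = x + n, 1e300
        d = 1.0 / b
        h = d
        for i in range(1, 1000):
            a = -i * (nm1 + i)
            b += 2.0
            d = 1.0 / (a * d + b)
            c = b + a / c
            delta = c * d
            h *= delta
            if abs(delta - 1.0) < 1e-14:
                return h
        raise RuntimeError("continued fraction did not converge")
    ans = 1.0 / nm1 if nm1 else -math.log(x) - EULER_GAMMA
    fact = 1.0
    for i in range(1, 1000):
        fact *= -x / i
        if i != nm1:
            delta = -fact / (i - nm1)
        else:
            psi = -EULER_GAMMA + sum(1.0 / j for j in range(1, nm1 + 1))
            delta = fact * (-math.log(x) + psi)
        ans += delta
        if abs(delta) < abs(ans) * 1e-15:
            return math.exp(x) * ans
    raise RuntimeError("series did not converge")


def capacity_perfect_csi(snr: float) -> float:
    """Perfect-CSI capacity (4) in bits/s/Hz."""
    return math.log2(math.e) * scaled_expint(1, 1.0 / snr)


def joint_bound_l1(snr: float, T: int, tau: int) -> float:
    """Lower bound (8): closed form of the expectation in (44)."""
    x = tau + 1.0 / snr  # reciprocal of SNR / (1 + SNR * tau)
    penalty = sum(scaled_expint(k, x) for k in range(1, T - tau + 1))
    return (T - tau) / T * capacity_perfect_csi(snr) - math.log2(math.e) * penalty / T


def joint_bound_l2(snr: float, T: int, tau: int) -> float:
    """Looser bound (9), which follows from (8) by Jensen's inequality."""
    return ((T - tau) / T * capacity_perfect_csi(snr)
            - math.log2((1.0 + snr * T) / (1.0 + snr * tau)) / T)
```

Because (9) is obtained from (8) through Jensen's inequality, `joint_bound_l2` should never exceed `joint_bound_l1` for the same arguments, which is a convenient sanity check.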



E. Optimization of Number of Pilot Symbols 

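An initial assessment of the optimum number of pilot symbols can be made on the basis of $I_{\mathrm{L}_2}$, whose maximization w.r.t. $\tau$ reduces to maximizing the concave function $\log_2(1+\mathsf{SNR}\,\tau) - \tau C$. By relaxing $\tau$ to a continuous value, the optimum number of pilots is

$$\tau^* = \frac{\log_2 e}{C} - \frac{1}{\mathsf{SNR}} \tag{10}$$

which satisfies $0 < \tau^* < 1$. This points to $\tau^*$ being, when restricted to integers, either 0 or 1. Furthermore, $C < \log_2(1+\mathsf{SNR})$ (by Jensen's inequality), so the objective is larger at $\tau = 1$, where it equals $\log_2(1+\mathsf{SNR}) - C$, than at $\tau = 0$, where it vanishes; hence $\tau^* = 1$.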
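The relaxed optimum (10) and the integer comparison that follows it can be checked for a few SNR values. This is a sketch, not the paper's code: `pilot_objective` is a hypothetical name for the concave function $\log_2(1+\mathsf{SNR}\,\tau) - \tau C$, and the exponential-integral helper is included so the block runs on its own.

```python
import math

EULER_GAMMA = 0.5772156649015329


def scaled_expint(n: int, x: float) -> float:
    """Return exp(x) * E_n(x) for integer n >= 1 and x > 0 (series / continued fraction)."""
    if n < 1 or x <= 0.0:
        raise ValueError("requires n >= 1 and x > 0")
    nm1 = n - 1
    if x > 1.0:
        b, c = x + n, 1e300
        d = 1.0 / b
        h = d
        for i in range(1, 1000):
            a = -i * (nm1 + i)
            b += 2.0
            d = 1.0 / (a * d + b)
            c = b + a / c
            delta = c * d
            h *= delta
            if abs(delta - 1.0) < 1e-14:
                return h
        raise RuntimeError("continued fraction did not converge")
    ans = 1.0 / nm1 if nm1 else -math.log(x) - EULER_GAMMA
    fact = 1.0
    for i in range(1, 1000):
        fact *= -x / i
        if i != nm1:
            delta = -fact / (i - nm1)
        else:
            psi = -EULER_GAMMA + sum(1.0 / j for j in range(1, nm1 + 1))
            delta = fact * (-math.log(x) + psi)
        ans += delta
        if abs(delta) < abs(ans) * 1e-15:
            return math.exp(x) * ans
    raise RuntimeError("series did not converge")


def capacity_perfect_csi(snr: float) -> float:
    """Perfect-CSI capacity (4) in bits/s/Hz."""
    return math.log2(math.e) * scaled_expint(1, 1.0 / snr)


def relaxed_optimal_pilots(snr: float) -> float:
    """Continuous maximizer (10) of log2(1 + SNR*tau) - tau*C."""
    return math.log2(math.e) / capacity_perfect_csi(snr) - 1.0 / snr


def pilot_objective(snr: float, tau: float) -> float:
    """The tau-dependent part of T * I_L2 in (9), up to an additive constant."""
    return math.log2(1.0 + snr * tau) - tau * capacity_perfect_csi(snr)
```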

In order to sharpen the above assessment, we turn to the tighter $I_{\mathrm{L}_1}$ and consider the low- and high-power regimes separately. In the low-power regime, using

$$C = \log_2(e)\left(\mathsf{SNR} - \mathsf{SNR}^2\right) + O(\mathsf{SNR}^3) \tag{11}$$

and

$$e^{\tau + 1/\mathsf{SNR}}\, E_k\!\left(\tau + \frac{1}{\mathsf{SNR}}\right) = \mathsf{SNR} - (k+\tau)\,\mathsf{SNR}^2 + O(\mathsf{SNR}^3) \tag{12}$$

it is found that maximizing $I_{\mathrm{L}_1}$ to second order entails maximizing the concave function $(T-\tau)(T+\tau-1)$. Thus, the optimum is again either $\tau = 0$ or $\tau = 1$. While both values yield the same $I_{\mathrm{L}_1}$ to second order, an exact computation of (8) reveals that $\tau^* = 1$ for $\mathsf{SNR} \to 0$.
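In the high-power regime, using

$$\log_2(e)\, e^{1/\mathsf{SNR}}\, E_1\!\left(\frac{1}{\mathsf{SNR}}\right) = \log_2 \mathsf{SNR} - \gamma\log_2 e + O\!\left(\frac{\log \mathsf{SNR}}{\mathsf{SNR}}\right) \tag{13}$$

$$e^{1/\mathsf{SNR}}\, E_k\!\left(\frac{1}{\mathsf{SNR}}\right) = \frac{1}{k-1} + O\!\left(\frac{\log \mathsf{SNR}}{\mathsf{SNR}}\right), \qquad k > 1 \tag{14}$$

where $\gamma = 0.5772\ldots$ is the Euler-Mascheroni constant, it is found that

$$T\, I_{\mathrm{L}_1}\big|_{\tau=0} = T\,C - \log_2\mathsf{SNR} + \gamma\log_2 e - \log_2(e)\sum_{k=1}^{T-1}\frac{1}{k} + o(1) \tag{15}$$

$$T\, I_{\mathrm{L}_1}\big|_{\tau=1} = (T-1)\,C - \log_2(e)\sum_{k=1}^{T-1} e\,E_k(1) + o(1). \tag{16}$$

Subtracting (15) from (16) and invoking (13), the advantage of $\tau = 1$ over $\tau = 0$ tends to $\frac{\log_2 e}{T}\sum_{k=1}^{T-1}\left(\frac{1}{k} - e\,E_k(1)\right)$. Since $e\,E_k(1) < 1/k$ strictly, $\tau = 1$ is preferable over $\tau = 0$ for $\mathsf{SNR} \to \infty$. (For $\tau \ge 2$, $I_{\mathrm{L}_1}$ falls rapidly.)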
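The high-power argument only needs the constants $a_k = e\,E_k(1)$. In the sketch below (hypothetical helper names), $e\,E_1(1)$ is computed from the convergent series $E_1(1) = -\gamma + \sum_{i\ge 1} (-1)^{i+1}/(i\cdot i!)$, and higher orders follow from the standard recurrence $E_{k+1}(x) = (e^{-x} - x E_k(x))/k$, which at $x = 1$ reads $a_{k+1} = (1 - a_k)/k$. This recurrence is numerically stable here because any error is divided by $k$ at each step. The block also evaluates the second-order low-power objective $(T-\tau)(T+\tau-1)$.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gompertz_sequence(k_max: int) -> list:
    """Return [e*E_1(1), e*E_2(1), ..., e*E_{k_max}(1)]."""
    e1, factorial = -EULER_GAMMA, 1.0
    for i in range(1, 30):                      # series for E_1(1); 29 terms are plenty
        factorial *= i
        e1 += (-1) ** (i + 1) / (i * factorial)
    seq = [math.e * e1]
    for k in range(1, k_max):                   # a_{k+1} = (1 - a_k) / k
        seq.append((1.0 - seq[-1]) / k)
    return seq


def high_snr_gain_one_pilot(T: int) -> float:
    """Limit of T * [I_L1(tau=1) - I_L1(tau=0)] as SNR -> infinity, in bits.

    Obtained by subtracting (15) from (16) and using (13).
    """
    a = gompertz_sequence(T - 1)
    return math.log2(math.e) * sum(1.0 / k - a[k - 1] for k in range(1, T))


def low_snr_objective(T: int, tau: int) -> int:
    """Second-order (low-SNR) objective (T - tau)(T + tau - 1) for I_L1."""
    return (T - tau) * (T + tau - 1)
```

The bounds $1/(k+1) < e\,E_k(1) < 1/k$ are what make every term of the high-SNR gain positive.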

Altogether, the optimum number of pilots is $\tau^* = 1$ in both the low- and high-power regimes. Setting $\tau = 0$ results in a slight loss (quantified in Section II-G), whereas $\tau \ge 2$ is decidedly suboptimal at moderate/high SNR.

Extrapolating this result to more realistic continuous- 
fading channels (i.e., the channel varies from symbol- 
to-symbol according to a random process), we can infer 
that, with joint processing, it is desirable to have at most 
roughly one pilot symbol per coherence interval. 
Fig. 1. Spectral efficiency vs. $T$ for a SISO channel at SNR = 0 dB and SNR = 10 dB. The curves correspond to $C$, $I_{\mathrm{s}}$ and $I_{\mathrm{L}_1}$ (with $\tau = 1$).
Fig. 2. Power advantage of joint relative to separate processing asymptotically ($\mathsf{SNR} \to \infty$) and at SNR = 10 dB and SNR = 20 dB.



F. Comparison with Separate Processing of Pilots and Data 

The value of joint processing is illustrated by examining how the spectral efficiency converges to the perfect-CSI capacity as the blocklength $T$ increases. From (9), the difference between $C$ and $I_{\mathrm{L}_2}$ is

$$C - I_{\mathrm{L}_2} = \frac{\tau}{T}\, C + \frac{1}{T}\log_2\!\left(\frac{1+\mathsf{SNR}\,T}{1+\mathsf{SNR}\,\tau}\right) \tag{17}$$

$$= O\!\left(\frac{\log T}{T}\right) \tag{18}$$

for any fixed value of $\tau$. On the other hand, the difference between $C$ and the spectral efficiency achievable with separate processing, $I_{\mathrm{s}}$, vanishes only as $O(1/\sqrt{T})$ [11]. This contrast is evidenced in Fig. 1.

With joint processing, as $T$ grows the spectral efficiency converges to $C$ even though $\tau$ is fixed, because the (possibly implicit) channel estimation process can take advantage of the data symbols. On the other hand, if $\tau$ were kept fixed, the spectral efficiency of the separate approach would not converge to $C$; $I_{\mathrm{s}}$ converges to $C$ only because $\tau$ is properly increased with $T$, in proportion to $\sqrt{T}$.
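The contrast in convergence rates can be reproduced numerically. The sketch below (hypothetical function names; helpers included so the block runs on its own) compares $C - I_{\mathrm{L}_2}$ with $\tau = 1$ held fixed against $C - I_{\mathrm{s}}$ with $\tau$ optimized, at SNR = 10 dB. It is an illustration of (17)-(18), not a reproduction of Fig. 1.

```python
import math

EULER_GAMMA = 0.5772156649015329


def scaled_expint(n: int, x: float) -> float:
    """Return exp(x) * E_n(x) for integer n >= 1 and x > 0 (series / continued fraction)."""
    if n < 1 or x <= 0.0:
        raise ValueError("requires n >= 1 and x > 0")
    nm1 = n - 1
    if x > 1.0:
        b, c = x + n, 1e300
        d = 1.0 / b
        h = d
        for i in range(1, 1000):
            a = -i * (nm1 + i)
            b += 2.0
            d = 1.0 / (a * d + b)
            c = b + a / c
            delta = c * d
            h *= delta
            if abs(delta - 1.0) < 1e-14:
                return h
        raise RuntimeError("continued fraction did not converge")
    ans = 1.0 / nm1 if nm1 else -math.log(x) - EULER_GAMMA
    fact = 1.0
    for i in range(1, 1000):
        fact *= -x / i
        if i != nm1:
            delta = -fact / (i - nm1)
        else:
            psi = -EULER_GAMMA + sum(1.0 / j for j in range(1, nm1 + 1))
            delta = fact * (-math.log(x) + psi)
        ans += delta
        if abs(delta) < abs(ans) * 1e-15:
            return math.exp(x) * ans
    raise RuntimeError("series did not converge")


def capacity_perfect_csi(snr: float) -> float:
    return math.log2(math.e) * scaled_expint(1, 1.0 / snr)


def separate_processing_rate(snr: float, T: int):
    """(I_s, best tau) from (5)-(6) by exhaustive search over tau = 1, ..., T-1."""
    best = (-math.inf, 0)
    for tau in range(1, T):
        mmse = 1.0 / (1.0 + snr * tau)
        snr_eff = snr * (1.0 - mmse) / (1.0 + snr * mmse)
        best = max(best, ((T - tau) / T * capacity_perfect_csi(snr_eff), tau))
    return best


def joint_bound_l2(snr: float, T: int, tau: int) -> float:
    """Bound (9)."""
    return ((T - tau) / T * capacity_perfect_csi(snr)
            - math.log2((1.0 + snr * T) / (1.0 + snr * tau)) / T)


if __name__ == "__main__":
    snr = 10.0  # 10 dB
    c = capacity_perfect_csi(snr)
    for T in (10, 100, 1000):
        gap_joint = c - joint_bound_l2(snr, T, 1)
        gap_sep = c - separate_processing_rate(snr, T)[0]
        # Per the text, T*gap_joint/log2(T) and sqrt(T)*gap_sep should stay roughly bounded.
        print(T, gap_joint, gap_sep, T * gap_joint / math.log2(T), math.sqrt(T) * gap_sep)
```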

G. High-Power Behavior 

Further insight is obtained by studying the high-power behavior of the various bounds. At high SNR, and for $\tau = 1$, the lower bounds behave as

$$I_{\mathrm{L}_1} = \frac{T-1}{T}\left[C - \frac{\log_2 e}{T-1}\sum_{k=1}^{T-1} e\,E_k(1)\right] + o(1) \tag{19}$$

$$I_{\mathrm{L}_2} = \frac{T-1}{T}\left[C - \frac{\log_2 T}{T-1}\right] + o(1) \tag{20}$$

while, with separate processing [3],

$$I_{\mathrm{s}} = \frac{T-1}{T}\left[C - 1\right] + o(1). \tag{21}$$

All the above quantities have the same pre-log factor, $(T-1)/T$, and thus the difference between the terms inside the brackets directly gives the power penalty relative to the perfect-CSI capacity, i.e., the horizontal shift in a plot of spectral efficiency vs. SNR (dB). When the information units are bits, this horizontal shift is in 3-dB units [12].

The asymptotic difference between $I_{\mathrm{L}_1}$ and $I_{\mathrm{L}_2}$ is

$$\frac{1}{T-1}\left[\log_2 T - \log_2(e)\sum_{k=1}^{T-1} e\,E_k(1)\right] \tag{22}$$

in 3-dB units. This quantity decreases with $T$ and is minute even for small values of $T$ (e.g., 0.02 dB for $T = 10$) and thus, at high SNR, we can consider the simpler $I_{\mathrm{L}_2}$ with only a negligible loss in accuracy.

Based on $I_{\mathrm{L}_2}$, then, the asymptotic power advantage of joint processing relative to separate processing is

$$1 - \frac{\log_2 T}{T-1} \tag{23}$$

in 3-dB units. In Fig. 2, this quantity is plotted versus $T$, along with the numerically computed advantage at SNR = 10 dB and SNR = 20 dB. (The difference between the respective curves indicates that the convergence of $I_{\mathrm{s}}$ to its asymptote occurs ever more slowly as $T$ grows.)

Using $I_{\mathrm{L}_2}$ and (13), it is also straightforward to compute the high-power advantage of transmitting one pilot symbol ($\tau = 1$) rather than none ($\tau = 0$) as

$$\frac{\gamma\log_2 e}{T-1} \tag{24}$$

in 3-dB units. For short blocks the single pilot is useful, but for larger blocklengths it makes little difference.

Finally, we can also quantify the distance to the true capacity of the block-fading channel. In [13], such capacity (indicated by $\bar{C}$ to distinguish it from $C$, the capacity with perfect CSI) is shown to converge, for $\mathsf{SNR} \to \infty$, to



C 



T- 1 
T 
(25)



Using Stirling's approximation,

$$\bar{C} \approx \frac{T-1}{T}\left[C - \frac{1}{2}\,\frac{\log_2 T}{T-1}\right] \tag{26}$$



for large $\mathsf{SNR}$, coinciding with the high-SNR expansion of $I_{\mathrm{L}_2}$ save for the factor 1/2. This indicates that the spectral efficiency with joint processing scales with the blocklength $T$ in the same manner as the true capacity in the high-power regime. Furthermore, the power offset between $I_{\mathrm{L}_2}$ and the true capacity is only (approximately)

$$\frac{1}{2}\,\frac{\log_2 T}{T-1} \tag{27}$$

in 3-dB units. This evaluates, for instance, to 0.55 dB and 0.1 dB for $T = 10$ and $T = 100$, respectively.
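The high-power offsets (22)-(24) and (27) reduce to elementary arithmetic once the constants $e\,E_k(1)$ are available. The sketch below (hypothetical function names) converts 3-dB units to dB by multiplying by $10\log_{10}2 \approx 3.01$, and can be used to reproduce the figures quoted in the text: about 0.02 dB for (22) at $T = 10$, and about 0.55 dB and 0.1 dB for (27) at $T = 10$ and $T = 100$.

```python
import math

EULER_GAMMA = 0.5772156649015329
DB_PER_3DB_UNIT = 10.0 * math.log10(2.0)  # one "3-dB unit" (1 bit of horizontal shift) in dB


def gompertz_sequence(k_max: int) -> list:
    """[e*E_1(1), ..., e*E_{k_max}(1)] via the series for E_1(1) and a_{k+1} = (1 - a_k)/k."""
    e1, factorial = -EULER_GAMMA, 1.0
    for i in range(1, 30):
        factorial *= i
        e1 += (-1) ** (i + 1) / (i * factorial)
    seq = [math.e * e1]
    for k in range(1, k_max):
        seq.append((1.0 - seq[-1]) / k)
    return seq


def gap_l1_l2(T: int) -> float:
    """Eq. (22): asymptotic difference between I_L1 and I_L2 (tau = 1), in 3-dB units."""
    a = gompertz_sequence(T - 1)
    return (math.log2(T) - math.log2(math.e) * sum(a)) / (T - 1)


def joint_vs_separate(T: int) -> float:
    """Eq. (23): asymptotic power advantage of joint over separate processing."""
    return 1.0 - math.log2(T) / (T - 1)


def one_pilot_vs_none(T: int) -> float:
    """Eq. (24): high-power advantage of tau = 1 over tau = 0, based on I_L2."""
    return EULER_GAMMA * math.log2(math.e) / (T - 1)


def offset_to_true_capacity(T: int) -> float:
    """Eq. (27): approximate power offset between I_L2 and the true capacity."""
    return 0.5 * math.log2(T) / (T - 1)


if __name__ == "__main__":
    for T in (10, 100):
        offsets = (gap_l1_l2, joint_vs_separate, one_pilot_vs_none, offset_to_true_capacity)
        print(T, [round(f(T) * DB_PER_3DB_UNIT, 3) for f in offsets])
```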

III. Generalization to MIMO

A. Channel Model 

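With $n_T$ transmit and $n_R$ receive antennas, the SISO input-output relationships in (1) and (2) become

$$\mathbf{Y}_p = \sqrt{\frac{\mathsf{SNR}}{n_T}}\,\mathbf{H}\mathbf{P} + \mathbf{N}_p \tag{28}$$

$$\mathbf{Y}_d = \sqrt{\frac{\mathsf{SNR}}{n_T}}\,\mathbf{H}\mathbf{X} + \mathbf{N}_d \tag{29}$$

where $\mathbf{H}$, $\mathbf{P}$, $\mathbf{X}$, $\mathbf{N}_p$ and $\mathbf{N}_d$ are, respectively, $n_R \times n_T$, $n_T \times \tau$, $n_T \times (T-\tau)$, $n_R \times \tau$ and $n_R \times (T-\tau)$. Matrices $\mathbf{H}$, $\mathbf{X}$, $\mathbf{N}_p$ and $\mathbf{N}_d$ have IID zero-mean unit-variance complex Gaussian entries, while $\mathbf{P}$ must satisfy the power constraint $\mathrm{Tr}\{\mathbf{P}\mathbf{P}^\dagger\} \le n_T\tau$.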

B. Perfect CSI 

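For notational convenience, define $C_{t,r}(\cdot)$ as the function

$$C_{t,r}(\rho) = E\left[\log_2\det\left(\mathbf{I} + \frac{\rho}{t}\,\mathbf{Z}\mathbf{Z}^\dagger\right)\right] \tag{30}$$

where $\mathbf{Z}$ is an $r \times t$ matrix with IID zero-mean unit-variance complex Gaussian entries. The MIMO perfect-CSI capacity with $n_T$ transmit and $n_R$ receive antennas at $\mathsf{SNR}$ equals $C_{n_T,n_R}(\mathsf{SNR})$.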

C. Separated Processing of Pilots and Data 

The SISO expressions for $I_{\mathrm{s}}$ in Section II-C apply verbatim with $T$, $\tau$, and $C(\cdot)$ replaced, respectively, by $\tilde{T} = T/n_T$, $\tilde{\tau} = \tau/n_T$, and $C_{n_T,n_R}(\cdot)$.

D. Spectral Efficiency Lower Bounds for Joint Processing 

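In the MIMO case, we allow for the possibility of either no pilot symbols ($\tau = 0$) or of at least one pilot symbol per transmit antenna ($\tau \ge n_T$).

Theorem 2: Let $\tau = 0$ or $\tau \ge n_T$. The ergodic spectral efficiency in bits/s/Hz when $\tau$ pilot symbols and $(T-\tau)$ complex Gaussian data symbols are transmitted on every fading block and jointly processed at the receiver satisfies

$$\frac{1}{T}\, I(\mathbf{X};\mathbf{Y}_p,\mathbf{Y}_d) \ge I_{\mathrm{L}_1} \ge I_{\mathrm{L}_2} \tag{31}$$

where

$$I_{\mathrm{L}_1} = \frac{T-\tau}{T}\, C_{n_T,n_R}(\mathsf{SNR}) - \frac{n_R}{T}\, C_{n_T,T-\tau}\!\left(\frac{\mathsf{SNR}}{1+\mathsf{SNR}\,\tau/n_T}\right) \tag{32}$$

and

$$I_{\mathrm{L}_2} = \frac{T-\tau}{T}\, C_{n_T,n_R}(\mathsf{SNR}) - \frac{n_T n_R}{T}\log_2\!\left(\frac{1+\mathsf{SNR}\,T/n_T}{1+\mathsf{SNR}\,\tau/n_T}\right). \tag{33}$$

Proof: See Appendix B.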



As a by-product of the proof, we show that $I_{\mathrm{L}_1}$ is maximized when the pilot matrix $\mathbf{P}$ satisfies

$$\mathbf{P}\mathbf{P}^\dagger = \tau\,\mathbf{I} \tag{34}$$

which coincides with the optimality condition derived in [3] for the case of separate processing.

Henceforth, we shall focus on the case $n_T = n_R = n$.



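Corollary 1: If $n_T = n_R = n$, then

$$\frac{I_{\mathrm{L}_2}}{n} = \frac{T/n - \tau/n}{T/n}\,\frac{C_{n,n}(\mathsf{SNR})}{n} - \frac{1}{T/n}\log_2\!\left(\frac{1+\mathsf{SNR}\,T/n}{1+\mathsf{SNR}\,\tau/n}\right) \tag{35}$$

which coincides with its SISO counterpart in (9), only with an effective fading blocklength of $T/n$, an effective number of pilot symbols of $\tau/n$, and $C$ replaced by $C_{n,n}/n$.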
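Corollary 1 is an algebraic identity, so it can be checked for any value of $C_{n,n}(\mathsf{SNR})$. In the sketch below (hypothetical function names), the capacity is passed in as a plain number, so any numeric stand-in works; (33) divided by $n$ is then compared with (9) evaluated at $T/n$, $\tau/n$ and $C_{n,n}/n$.

```python
import math


def joint_bound_l2_mimo(snr: float, T: int, tau: int, c_mimo: float, n_t: int, n_r: int) -> float:
    """Bound (33); c_mimo stands for C_{n_T,n_R}(SNR) and is supplied by the caller."""
    return ((T - tau) / T * c_mimo
            - n_t * n_r / T * math.log2((1.0 + snr * T / n_t) / (1.0 + snr * tau / n_t)))


def joint_bound_l2_siso(snr: float, T: float, tau: float, c: float) -> float:
    """SISO bound (9), written so that T and tau may be non-integer (effective) values."""
    return (T - tau) / T * c - math.log2((1.0 + snr * T) / (1.0 + snr * tau)) / T
```

With $n_T = n_R = 1$ the two functions coincide, which is the SISO special case of Theorem 2.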

E. Optimization of Number of Pilot Symbols 

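In the low-power regime, the number of pilot symbols can be optimized on the basis of $I_{\mathrm{L}_1}$. Using

$$C_{t,r}(\rho) = r\log_2(e)\left(\rho - \frac{t+r}{2t}\,\rho^2\right) + O(\rho^3) \tag{36}$$

it is found that maximizing $I_{\mathrm{L}_1}$ to second order requires maximizing the concave function $(T-\tau)(T+\tau-n)$. Its unconstrained maximizer, $\tau = n/2$, is excluded by the condition $\tau = 0$ or $\tau \ge n$ of Theorem 2, so either $\tau = 0$ or $\tau = n$ is optimal, and the two are indistinguishable to second order.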
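The coefficients in (36) follow from $\ln\det(\mathbf{I}+\mathbf{A}) = \mathrm{tr}\,\mathbf{A} - \tfrac{1}{2}\mathrm{tr}\,\mathbf{A}^2 + \ldots$ with $\mathbf{A} = (\rho/t)\,\mathbf{Z}\mathbf{Z}^\dagger$, together with the moments $E[\mathrm{tr}\,\mathbf{Z}\mathbf{Z}^\dagger] = rt$ and $E[\mathrm{tr}(\mathbf{Z}\mathbf{Z}^\dagger)^2] = rt(r+t)$. The Monte Carlo sketch below checks those two moments using plain Python lists for the matrices; the function names, trial count and seed are illustrative choices.

```python
import math
import random


def complex_gaussian(rng: random.Random) -> complex:
    """Circularly-symmetric complex Gaussian sample with zero mean and unit variance."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def trace_moments(r: int, t: int, trials: int = 20000, seed: int = 7):
    """Monte Carlo estimates of E[tr(Z Z^H)] and E[tr((Z Z^H)^2)] for an r x t matrix Z."""
    rng = random.Random(seed)
    m1 = m2 = 0.0
    for _ in range(trials):
        z = [[complex_gaussian(rng) for _ in range(t)] for _ in range(r)]
        # Gram matrix G = Z Z^H (r x r, Hermitian)
        g = [[sum(z[i][k] * z[j][k].conjugate() for k in range(t)) for j in range(r)]
             for i in range(r)]
        m1 += sum(g[i][i].real for i in range(r))
        m2 += sum(abs(g[i][j]) ** 2 for i in range(r) for j in range(r))  # tr(G^2) = ||G||_F^2
    return m1 / trials, m2 / trials


def second_order_coefficient(r: int, t: int) -> float:
    """c2 in C_{t,r}(rho) = log2(e) * (r*rho - c2*rho^2) + O(rho^3), as in (36)."""
    return r * (t + r) / (2.0 * t)
```

For $r = t = 1$ the coefficient equals 1, recovering the SISO expansion (11).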

Drawing parallels with its SISO counterpart, the maximization of $I_{\mathrm{L}_2}$ w.r.t. $\tau$ is equivalent to the maximization of $\log_2(1+\mathsf{SNR}\,\tilde{\tau}) - \tilde{\tau}\,C_{n,n}/n$ w.r.t. $\tilde{\tau} = \tau/n$. Hence,

$$\tilde{\tau}^* = \frac{\log_2 e}{C_{n,n}/n} - \frac{1}{\mathsf{SNR}} \tag{37}$$

if $\tilde{\tau}$ is relaxed to continuous values. This quantity is below unity whenever $C_{n,n}/n > \log_2 e$, which implies that the optimum number of pilots is either 0 or $n$. Since $C_{n,n}/n < \log_2(1+\mathsf{SNR})$, $\tau = n$ is preferred over $\tau = 0$.

F. High-Power Behavior 

Because $I_{\mathrm{L}_2}$ and $I_{\mathrm{s}}$ mirror their SISO counterparts, the asymptotic power advantage (in 3-dB units) of joint relative to separate processing for MIMO is the SISO advantage for an effective blocklength of $T/n$, i.e.,

$$1 - \frac{\log_2(T/n)}{T/n - 1}. \tag{38}$$



Appendix A 

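By the chain rule, the mutual information with perfect receiver knowledge of $H$ expands as $I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d,H) = I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d) + I(\mathbf{x};H|\mathbf{y}_p,\mathbf{y}_d)$. Thus,

$$I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d) = I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d,H) - I(\mathbf{x};H|\mathbf{y}_p,\mathbf{y}_d) \tag{39}$$

$$= I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d,H) - h(H|\mathbf{y}_p,\mathbf{y}_d) + h(H|\mathbf{y}_p,\mathbf{y}_d,\mathbf{x}) \tag{40}$$

$$\ge I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d,H) - h(H|\mathbf{y}_p) + h(H|\mathbf{y}_p,\mathbf{y}_d,\mathbf{x}) \tag{41}$$

where $h(\cdot)$ denotes differential entropy and (41) holds because conditioning reduces entropy.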

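The signal-to-noise ratio when estimating $H$ on the basis of $\mathbf{y}_p$ is $\mathsf{SNR}\,\tau$. Thus, $H|\mathbf{y}_p$ is conditionally Gaussian with variance $1/(1+\mathsf{SNR}\,\tau)$ and therefore

$$h(H|\mathbf{y}_p) = \log_2(\pi e) - \log_2(1+\mathsf{SNR}\,\tau). \tag{42}$$

In turn, the signal-to-noise ratio when estimating $H$ on the basis of $(\mathbf{y}_p,\mathbf{y}_d)$, conditioned on $\mathbf{x}$, is $\mathsf{SNR}\,\tau + \mathsf{SNR}\sum_{k=1}^{T-\tau}|x_k|^2$ and thus

$$h(H|\mathbf{y}_p,\mathbf{y}_d,\mathbf{x}) = -E\left[\log_2\left(1 + \mathsf{SNR}\,\tau + \mathsf{SNR}\sum_{k=1}^{T-\tau}|x_k|^2\right)\right] + \log_2(\pi e). \tag{43}$$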



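Using $I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d,H) = (T-\tau)\,C$, plugging (42) and (43) into (41), and scaling all the terms by $1/T$,

$$\frac{1}{T}\, I(\mathbf{x};\mathbf{y}_p,\mathbf{y}_d) \ge \frac{T-\tau}{T}\, C - \frac{1}{T}\, E\left[\log_2\left(1 + \frac{\mathsf{SNR}}{1+\mathsf{SNR}\,\tau}\sum_{k=1}^{T-\tau}|x_k|^2\right)\right]. \tag{44}$$

A closed form for the expectation in (44) is given in [9], leading directly to (8).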
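The expectation in (44) involves $G = \sum_{k=1}^{T-\tau}|x_k|^2$, which is Gamma-distributed with shape $T-\tau$ and unit scale. The identity $E[\ln(1+aG)] = \sum_{k=1}^{T-\tau} e^{1/a}E_k(1/a)$, with $a = \mathsf{SNR}/(1+\mathsf{SNR}\,\tau)$ and hence $1/a = \tau + 1/\mathsf{SNR}$, is what turns (44) into (8). The sketch below checks it by Monte Carlo; the function names, seed and trial count are illustrative, and the exponential-integral helper is included so the block runs on its own.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def scaled_expint(n: int, x: float) -> float:
    """Return exp(x) * E_n(x) for integer n >= 1 and x > 0 (series / continued fraction)."""
    if n < 1 or x <= 0.0:
        raise ValueError("requires n >= 1 and x > 0")
    nm1 = n - 1
    if x > 1.0:
        b, c = x + n, 1e300
        d = 1.0 / b
        h = d
        for i in range(1, 1000):
            a = -i * (nm1 + i)
            b += 2.0
            d = 1.0 / (a * d + b)
            c = b + a / c
            delta = c * d
            h *= delta
            if abs(delta - 1.0) < 1e-14:
                return h
        raise RuntimeError("continued fraction did not converge")
    ans = 1.0 / nm1 if nm1 else -math.log(x) - EULER_GAMMA
    fact = 1.0
    for i in range(1, 1000):
        fact *= -x / i
        if i != nm1:
            delta = -fact / (i - nm1)
        else:
            psi = -EULER_GAMMA + sum(1.0 / j for j in range(1, nm1 + 1))
            delta = fact * (-math.log(x) + psi)
        ans += delta
        if abs(delta) < abs(ans) * 1e-15:
            return math.exp(x) * ans
    raise RuntimeError("series did not converge")


def closed_form_expected_log(n: int, a: float) -> float:
    """E[ln(1 + a*G)] for G ~ Gamma(n, 1), as sum_{k=1}^{n} exp(1/a) * E_k(1/a)."""
    return sum(scaled_expint(k, 1.0 / a) for k in range(1, n + 1))


def monte_carlo_expected_log(n: int, a: float, trials: int = 200_000, seed: int = 3) -> float:
    """Sample mean of ln(1 + a*G), with G ~ Gamma(n, 1) (a sum of n unit-mean exponentials)."""
    rng = random.Random(seed)
    return sum(math.log1p(a * rng.gammavariate(n, 1.0)) for _ in range(trials)) / trials
```

Multiplying `closed_form_expected_log(T - tau, snr / (1 + snr * tau))` by $\log_2 e$ gives the expectation in (44) in bits.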

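The subsequent lower bound, $I_{\mathrm{L}_2}$, follows from application of Jensen's inequality to (44). Since $E[|x_k|^2] = 1$,

$$E\left[\log_2\left(1 + \frac{\mathsf{SNR}}{1+\mathsf{SNR}\,\tau}\sum_{k=1}^{T-\tau}|x_k|^2\right)\right] \le \log_2\left(1 + \frac{\mathsf{SNR}\,(T-\tau)}{1+\mathsf{SNR}\,\tau}\right). \tag{45}$$

Since $1 + \frac{\mathsf{SNR}\,(T-\tau)}{1+\mathsf{SNR}\,\tau} = \frac{1+\mathsf{SNR}\,T}{1+\mathsf{SNR}\,\tau}$, substituting (45) into (44) yields (9).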



Appendix B 



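Starting at (41), we need only compute $h(\mathbf{H}|\mathbf{Y}_p)$ and $h(\mathbf{H}|\mathbf{Y}_p,\mathbf{Y}_d,\mathbf{X})$. Because the receive antennas are decoupled when conditioned on either $\mathbf{Y}_p$ or $(\mathbf{Y}_p,\mathbf{Y}_d,\mathbf{X})$, these terms can be evaluated separately for each receive antenna. From [3], the covariances of one row of $\mathbf{H}$ conditioned on $\mathbf{Y}_p$ and on $(\mathbf{Y}_p,\mathbf{Y}_d,\mathbf{X})$, respectively, are

$$\mathbf{K}_{\mathbf{h}|\mathbf{Y}_p} = \left(\mathbf{I} + \frac{\mathsf{SNR}}{n_T}\,\mathbf{P}\mathbf{P}^\dagger\right)^{-1} \tag{46}$$

$$\mathbf{K}_{\mathbf{h}|\mathbf{Y}_p,\mathbf{Y}_d,\mathbf{X}} = \left(\mathbf{I} + \frac{\mathsf{SNR}}{n_T}\left(\mathbf{P}\mathbf{P}^\dagger + \mathbf{X}\mathbf{X}^\dagger\right)\right)^{-1} \tag{47}$$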



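Defining $\Delta = h(\mathbf{H}|\mathbf{Y}_p) - h(\mathbf{H}|\mathbf{Y}_p,\mathbf{Y}_d,\mathbf{X})$, we have

$$\Delta = n_R\log_2\det\mathbf{K}_{\mathbf{h}|\mathbf{Y}_p} - n_R\,E\left[\log_2\det\mathbf{K}_{\mathbf{h}|\mathbf{Y}_p,\mathbf{Y}_d,\mathbf{X}}\right] \tag{48}$$

$$= n_R\,E\left[\log_2\det\left(\mathbf{I} + \left(\mathbf{I} + \frac{\mathsf{SNR}}{n_T}\,\mathbf{P}\mathbf{P}^\dagger\right)^{-1}\frac{\mathsf{SNR}}{n_T}\,\mathbf{X}\mathbf{X}^\dagger\right)\right]. \tag{49}$$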



To obtain $I_{\mathrm{L}_1}$ we must find the pilot sequence $\mathbf{P}$ that minimizes (49). This amounts to choosing the worst-case noise covariance when the input and the channel are both spatially white. Since the distribution of $\mathbf{X}$ is rotationally invariant, we need only consider diagonal forms for $\mathbf{P}\mathbf{P}^\dagger$. To show that the best choice is $\mathbf{P}\mathbf{P}^\dagger = \tau\mathbf{I}$, we apply the argument in [14, Sec. 4.1] to the function in (49), which is convex w.r.t. $\mathbf{P}\mathbf{P}^\dagger$. With $\mathbf{P}\mathbf{P}^\dagger = \tau\mathbf{I}$,



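$$\Delta = n_R\,E\left[\log_2\det\left(\mathbf{I} + \frac{\mathsf{SNR}/n_T}{1+\mathsf{SNR}\,\tau/n_T}\,\mathbf{X}\mathbf{X}^\dagger\right)\right] \tag{50}$$

$$= n_R\, C_{n_T,T-\tau}\!\left(\frac{\mathsf{SNR}}{1+\mathsf{SNR}\,\tau/n_T}\right) \tag{51}$$

which, together with $I(\mathbf{X};\mathbf{Y}_p,\mathbf{Y}_d,\mathbf{H}) = (T-\tau)\,C_{n_T,n_R}(\mathsf{SNR})$, yields (32). $I_{\mathrm{L}_2}$ is reached by applying Jensen's inequality to (50).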
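The pilot-optimality step can be illustrated for $n_T = 2$, where $\mathbf{P}\mathbf{P}^\dagger = \mathrm{diag}(q_1, q_2)$ with $q_1 + q_2 = n_T\tau$ and every determinant is $2\times 2$. For each realization of $\mathbf{X}$, the integrand of (49) is convex in $(q_1, q_2)$, so the average of its values at $(q_1,q_2)$ and at the swapped allocation $(q_2,q_1)$ can never fall below its value at $q_1 = q_2 = \tau$. The sketch below checks this per-sample inequality and the resulting ordering of the Monte Carlo estimates of $\Delta/n_R$. The two-antenna restriction, the function names and the seeds are assumptions made for brevity.

```python
import math
import random


def complex_gaussian(rng: random.Random) -> complex:
    """Circularly-symmetric complex Gaussian sample with zero mean and unit variance."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def delta_sample(q1: float, q2: float, x_rows, snr: float, n_t: int = 2) -> float:
    """Integrand of (49) for one realization X (2 rows), with P P^H = diag(q1, q2).

    Uses log2 det(I + (I + cQ)^{-1} cW) = log2 det(I + cQ + cW) - log2 det(I + cQ),
    where Q = diag(q1, q2), W = X X^H and c = SNR / n_T.
    """
    c = snr / n_t
    x1, x2 = x_rows
    w11 = sum(abs(v) ** 2 for v in x1)
    w22 = sum(abs(v) ** 2 for v in x2)
    w12 = sum(u * v.conjugate() for u, v in zip(x1, x2))
    det_full = (1.0 + c * (q1 + w11)) * (1.0 + c * (q2 + w22)) - c * c * abs(w12) ** 2
    return math.log2(det_full) - math.log2((1.0 + c * q1) * (1.0 + c * q2))


def average_delta(q1: float, q2: float, snr: float, data_len: int,
                  trials: int = 5000, seed: int = 11) -> float:
    """Monte Carlo estimate of Delta / n_R in (49); a fixed seed gives common random numbers."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x_rows = [[complex_gaussian(rng) for _ in range(data_len)] for _ in range(2)]
        total += delta_sample(q1, q2, x_rows, snr)
    return total / trials
```

A smaller $\Delta$ means a larger lower bound, so `average_delta(tau, tau, ...)` coming out below an unequal split such as `average_delta(0.4, 3.6, ...)` (for $\tau = 2$) is consistent with (34).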